AITopics | video demonstration

video demonstration

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

One Demonstration is Enough to Learn Robot Policies

Neural Information Processing Systems, Feb-16-2026, 13:17:59 GMT

Sequential decision-making problems typically require significant human supervision and data.

demonstration, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.68)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation

Tang, Nan, Pang, Jing-Cheng, Li, Guanlin, Qian, Chao, Yu, Yang

arXiv.org Artificial Intelligence, Sep-29-2025

Reward design remains a critical bottleneck in visual reinforcement learning (RL) for robotic manipulation. In simulated environments, rewards are conventionally designed based on the distance to a target position. However, such precise positional information is often unavailable in real-world visual settings due to sensory and perceptual limitations. In this study, we propose a method that implicitly infers spatial distances through keypoints extracted from images. Building on this, we introduce Reward Learning with Anticipation Model (ReLAM), a novel framework that automatically generates dense, structured rewards from action-free video demonstrations. ReLAM first learns an anticipation model that serves as a planner and proposes intermediate keypoint-based subgoals on the optimal path to the final goal, creating a structured learning curriculum directly aligned with the task's geometric objectives. Based on the anticipated subgoals, a continuous reward signal is provided to train a low-level, goal-conditioned policy under the hierarchical reinforcement learning (HRL) framework with a provable sub-optimality bound. Extensive experiments on complex, long-horizon manipulation tasks show that ReLAM significantly accelerates learning and achieves superior performance compared to state-of-the-art methods.
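The reward idea in this abstract, a continuous signal computed from keypoint distances to an anticipated subgoal, can be sketched as below. This is a minimal illustration, not ReLAM's implementation: the negative mean keypoint distance, the reach tolerance, and the helper names `keypoint_reward` and `advance_subgoal` are all assumptions, and the keypoints and subgoals are taken as given rather than predicted by a learned anticipation model.

```python
import numpy as np


def keypoint_reward(current_kps: np.ndarray, subgoal_kps: np.ndarray) -> float:
    """Dense reward: negative mean Euclidean distance between matched keypoints.

    Both arrays have shape (K, D): K keypoints, each with D coordinates.
    """
    if current_kps.shape != subgoal_kps.shape:
        raise ValueError("keypoint arrays must have the same shape")
    dists = np.linalg.norm(current_kps - subgoal_kps, axis=-1)
    return -float(dists.mean())


def advance_subgoal(idx: int, subgoals: list, current_kps: np.ndarray,
                    reach_tol: float = 0.05) -> int:
    """Move to the next anticipated subgoal once the current one is within reach_tol."""
    last = len(subgoals) - 1
    if idx < last and -keypoint_reward(current_kps, subgoals[idx]) <= reach_tol:
        return idx + 1
    return idx
```

Because the reward is a smooth function of keypoint positions, the low-level policy receives a signal at every step, whereas a success-only reward would stay at zero until the task is finished.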

artificial intelligence, machine learning, reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

2509.22402

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.34)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration

Lum, Tyler Ga Wei, Lee, Olivia Y., Liu, C. Karen, Bohg, Jeannette

arXiv.org Artificial Intelligence, Aug-19-2025

Teaching robots dexterous manipulation skills often requires collecting hundreds of demonstrations using wearables or teleoperation, a process that is challenging to scale. Videos of human-object interactions are easier to collect and scale, but leveraging them directly for robot learning is difficult due to the lack of explicit action labels and human-robot embodiment differences. We propose Human2Sim2Robot, a novel real-to-sim-to-real framework for training dexterous manipulation policies using only one RGB-D video of a human demonstrating a task. Our method utilizes reinforcement learning (RL) in simulation to cross the embodiment gap without relying on wearables, teleoperation, or large-scale data collection. From the video, we extract: (1) the object pose trajectory to define an object-centric, embodiment-agnostic reward, and (2) the pre-manipulation hand pose to initialize and guide exploration during RL training. These components enable effective policy learning without any task-specific reward tuning. In the single human demo regime, Human2Sim2Robot outperforms object-aware replay by over 55% and imitation learning by over 68% on grasping, non-prehensile manipulation, and multi-step tasks. Website: https://human2sim2robot.github.io
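The object-centric, embodiment-agnostic reward described above can be sketched as follows, assuming the demonstration provides a per-step object position and a unit quaternion in (w, x, y, z) order. The exponential shaping, the weights `pos_scale` and `rot_scale`, and the function names are illustrative assumptions, not values or code from the paper.

```python
import numpy as np


def quat_angle(q1: np.ndarray, q2: np.ndarray) -> float:
    """Rotation angle in radians between two unit quaternions.

    abs() makes q and -q (the same rotation) give an angle of 0.
    """
    dot = abs(float(np.dot(q1, q2)))
    return 2.0 * float(np.arccos(np.clip(dot, 0.0, 1.0)))


def object_pose_reward(obj_pos, obj_quat, demo_pos, demo_quat,
                       pos_scale: float = 10.0, rot_scale: float = 1.0) -> float:
    """Reward for how closely the object tracks the demonstrated pose.

    Only the object's pose enters the reward, so the same function applies
    whether a human hand or a robot hand moves the object.
    Returns 1.0 for a perfect match and decays toward 0 as the error grows.
    """
    pos_err = float(np.linalg.norm(np.asarray(obj_pos, float) - np.asarray(demo_pos, float)))
    rot_err = quat_angle(np.asarray(obj_quat, float), np.asarray(demo_quat, float))
    return float(np.exp(-pos_scale * pos_err - rot_scale * rot_err))
```

During RL, `demo_pos` and `demo_quat` would be the demonstrated object pose at the current step of the trajectory extracted from the video.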

artificial intelligence, demonstration, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2504.12609

Country: North America > United States (0.67)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.84)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

A Human-in-the-loop Approach to Robot Action Replanning through LLM Common-Sense Reasoning

Merlo, Elena, Lagomarsino, Marta, Ajoudani, Arash

arXiv.org Artificial Intelligence, Jul-29-2025

To facilitate the wider adoption of robotics, non-experts need accessible programming tools. Observational learning enables intuitive transfer of human skills through hands-on demonstrations, but relying solely on visual input can be inefficient in terms of scalability and failure mitigation, especially when only a single demonstration is available. This paper presents a human-in-the-loop method that enhances a robot execution plan, generated automatically from a single RGB video, with natural-language input to a Large Language Model (LLM). By incorporating user-specified goals or critical task aspects and exploiting the LLM's common-sense reasoning, the system adjusts the vision-based plan to prevent potential failures and adapts it according to the received instructions. Experiments demonstrated the framework's intuitiveness and its effectiveness in correcting vision-derived errors and adapting plans without requiring additional demonstrations. Moreover, interactive plan refinement and hallucination correction improved system robustness.
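The human-in-the-loop refinement step can be outlined as follows. This is a hypothetical sketch, not the authors' system: the prompt wording, the numbered-list output format, and the function names are assumptions, and `llm` stands for any callable that maps a prompt string to a completion string.

```python
from typing import Callable, List


def build_prompt(plan: List[str], instruction: str) -> str:
    """Combine the vision-derived plan and the user's instruction into one prompt."""
    steps = "\n".join(f"{i + 1}. {step}" for i, step in enumerate(plan))
    return (
        "You are refining a robot execution plan extracted from a video.\n"
        f"Current plan:\n{steps}\n"
        f"User instruction: {instruction}\n"
        "Return the revised plan as a numbered list, one action per line."
    )


def parse_plan(text: str) -> List[str]:
    """Keep only lines of the form '<number>. <action>'."""
    steps = []
    for line in text.splitlines():
        head, sep, tail = line.strip().partition(". ")
        if sep and head.isdigit() and tail.strip():
            steps.append(tail.strip())
    return steps


def refine_plan(plan: List[str], instruction: str,
                llm: Callable[[str], str]) -> List[str]:
    """Ask the LLM for a revised plan; fall back to the original if parsing fails."""
    revised = parse_plan(llm(build_prompt(plan, instruction)))
    return revised if revised else plan
```

Keeping the LLM behind a plain callable lets the same loop be exercised with a stub and later connected to a real model; the fallback keeps the vision-based plan when the response cannot be parsed.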

demonstration, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.2087

Country: Europe (0.46)

Genre:

Workflow (0.89)
Research Report (0.84)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Multi-step manipulation task and motion planning guided by video demonstration

Zorina, Kateryna, Kovar, David, Fourmy, Mederic, Lamiraux, Florent, Mansard, Nicolas, Carpentier, Justin, Sivic, Josef, Petrik, Vladimir

arXiv.org Artificial Intelligence, May-15-2025

This work aims to leverage instructional videos to solve complex multi-step task-and-motion planning problems in robotics. Towards this goal, we propose an extension of the well-established Rapidly-Exploring Random Tree (RRT) planner that simultaneously grows multiple trees around grasp and release states extracted from the guiding video. Our key novelty lies in combining contact states and 3D object poses extracted from the guiding video with a traditional planning algorithm, which allows us to solve tasks with sequential dependencies, for example, when an object needs to be placed at a specific location to be grasped later. We also investigate the ability of our approach to generalize beyond the scene depicted in the instructional video. To demonstrate the benefits of the proposed video-guided planning approach, we design a new benchmark with three challenging tasks: (i) 3D re-arrangement of multiple objects between a table and a shelf, (ii) multi-step transfer of an object through a tunnel, and (iii) transferring objects using a tray, similar to how a waiter carries dishes. We demonstrate the effectiveness of our planning algorithm on several robots, including the Franka Emika Panda and the KUKA KMR iiwa. For a seamless transfer of the obtained plans to the real robot, we develop a trajectory refinement approach formulated as an optimal control problem (OCP).

Traditional robot motion planning algorithms seek a collision-free path from a given start robot configuration to a given goal robot configuration [1]. Despite the high dimensionality of the configuration space, sampling-based motion planning algorithms [2], [3] have been shown to be highly effective for solving complex motion planning problems for robots, ranging from six degrees of freedom (DoFs) for industrial manipulators to tens of DoFs for humanoids [4]. Manipulation task-and-motion planning (TAMP) [5] adds further complexity by including movable objects in the state space. This requires the planner to discover the pick-and-place actions that connect the given start and goal robot configurations, bringing the manipulated objects from their start poses to their goal poses.

INRIA, Paris. This work is part of the AGIMUS project, funded by the European Union under GA no. 101070165. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission.
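For reference, a minimal sketch of the vanilla single-tree RRT that this work extends is shown below, in a 2D unit square rather than a robot configuration space. It does not reproduce the paper's multi-tree, video-guided planner; the `is_free` collision checker, the goal-biasing probability, and the step size are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def rrt(start: Point, goal: Point, is_free: Callable[[Point], bool],
        step: float = 0.05, goal_tol: float = 0.05, goal_bias: float = 0.1,
        max_iters: int = 5000, seed: int = 0) -> Optional[List[Point]]:
    """Grow one RRT from start in [0, 1]^2; return a start-to-goal path or None."""
    rng = random.Random(seed)
    nodes: List[Point] = [start]
    parent: List[int] = [-1]  # parent[i] is the index of node i's parent
    for _ in range(max_iters):
        # Sample a random point, occasionally the goal itself (goal biasing).
        sample = goal if rng.random() < goal_bias else (rng.random(), rng.random())
        near_i = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[near_i]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        # Steer: move at most `step` from the nearest node toward the sample.
        t = min(step, d) / d
        new = (near[0] + t * (sample[0] - near[0]), near[1] + t * (sample[1] - near[1]))
        # Check a few points along the new edge for collisions.
        edge = [(near[0] + k / 5 * (new[0] - near[0]), near[1] + k / 5 * (new[1] - near[1]))
                for k in range(1, 6)]
        if not all(is_free(p) for p in edge):
            continue
        nodes.append(new)
        parent.append(near_i)
        if math.dist(new, goal) <= goal_tol:
            # Walk parent pointers back to the root to recover the path.
            path, i = [], len(nodes) - 1
            while i != -1:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None
```

The paper's extension instead grows multiple trees simultaneously around the grasp and release states extracted from the guiding video.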

artificial intelligence, configuration, demonstration, (18 more...)

arXiv.org Artificial Intelligence

2505.08949

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Europe > Czechia > Prague (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Education > Educational Technology (0.95)
Government > Regional Government > Europe Government (0.74)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)

MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos

Wang, Xianghui, Zhang, Xinming, Chen, Yanjun, Shen, Xiaoyu, Zhang, Wei

arXiv.org Artificial Intelligence, May-14-2025

Vision-language models (VLMs) have demonstrated excellent high-level planning capabilities, enabling robots to learn locomotion skills from video demonstrations without meticulous reward design by humans. However, improper frame sampling and the low training efficiency of current methods remain critical bottlenecks, incurring substantial computational overhead and time costs. To address these limitations, we propose Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos (MA-ROESL). MA-ROESL integrates a motion-aware frame selection method that implicitly improves the quality of VLM-generated reward functions. It further employs a hybrid three-phase training pipeline that improves training efficiency through rapid reward optimization and derives the final policy through online fine-tuning. Experimental results show that MA-ROESL significantly improves training efficiency while faithfully reproducing locomotion skills in both simulated and real-world settings, underscoring its potential as a robust and scalable framework for efficient robot locomotion skill learning from video demonstrations.
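The abstract does not specify how MA-ROESL scores motion, so the sketch below uses a simple, hypothetical proxy: the mean absolute pixel difference between consecutive frames. It keeps the first frame plus the k-1 frames with the most motion, in temporal order, as an alternative to uniform sampling before frames are passed to a VLM. The function name and the scoring rule are assumptions; an optical-flow magnitude would be a natural substitute.

```python
from typing import List

import numpy as np


def motion_aware_frames(frames: np.ndarray, k: int) -> List[int]:
    """Return k frame indices with the largest inter-frame motion, in temporal order.

    frames: array of shape (T, H, W) or (T, H, W, C).
    The motion score of frame t >= 1 is the mean absolute difference to frame t - 1;
    frame 0 gets an infinite score so it is always kept.
    """
    x = frames.astype(np.float32)
    diffs = np.abs(x[1:] - x[:-1]).mean(axis=tuple(range(1, x.ndim)))
    scores = np.concatenate([[np.inf], diffs])
    top = np.argsort(-scores, kind="stable")[:k]
    return sorted(int(i) for i in top)
```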

large language model, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2505.08367

Country:

North America (0.46)
Asia > China (0.29)

Genre: Research Report (0.84)

Industry: Education (0.71)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (0.48)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation

Li, Jinhan, Zhu, Yifeng, Xie, Yuqi, Jiang, Zhenyu, Seo, Mingyo, Pavlakos, Georgios, Zhu, Yuke

arXiv.org Artificial Intelligence, Oct-15-2024

We study the problem of teaching humanoid robots manipulation skills by imitating single video demonstrations. We introduce OKAMI, a method that generates a manipulation plan from a single RGB-D video and derives a policy for execution. At the heart of our approach is object-aware retargeting, which enables the humanoid robot to mimic the human motions in an RGB-D video while adjusting to different object locations during deployment. OKAMI uses open-world vision models to identify task-relevant objects and retargets the body motions and hand poses separately. Our experiments show that OKAMI generalizes strongly across varying visual and spatial conditions, outperforming the state-of-the-art baseline on open-world imitation from observation. Furthermore, OKAMI rollout trajectories are used to train closed-loop visuomotor policies, which achieve an average success rate of 79.2% without the need for labor-intensive teleoperation. More videos can be found on our website https://ut-austin-rpl.github.io/OKAMI/.
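The core idea of object-aware retargeting, adapting a human's motion to a new object location, can be illustrated with a deliberately simplified sketch: the demonstrated hand trajectory for an approach segment is warped by the displacement between the demo object position and the new one, so it starts where the demo started and ends at the relocated object. The linear blend and the function name are assumptions for illustration; OKAMI itself retargets body motions and hand poses separately with the help of open-world vision models.

```python
import numpy as np


def retarget_approach(hand_traj, demo_obj_pos, new_obj_pos) -> np.ndarray:
    """Warp an approach trajectory of shape (T, 3) toward an object that has moved.

    The displacement is phased in linearly: weight 0 at the first waypoint
    (unchanged start) and weight 1 at the last waypoint (contact with the object).
    """
    traj = np.asarray(hand_traj, dtype=float)
    offset = np.asarray(new_obj_pos, dtype=float) - np.asarray(demo_obj_pos, dtype=float)
    weights = np.linspace(0.0, 1.0, len(traj))[:, None]
    return traj + weights * offset
```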

artificial intelligence, robot, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2410.11792

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (1.00)

One-shot Video Imitation via Parameterized Symbolic Abstraction Graphs

Wang, Jianren, Liu, Kangni, Guo, Dingkun, Zhou, Xian, Atkeson, Christopher G

arXiv.org Artificial Intelligence, Aug-22-2024

Learning to manipulate dynamic and deformable objects from a single demonstration video holds great promise in terms of scalability. Previous approaches have predominantly focused on either replaying object relationships or actor trajectories. The former often struggles to generalize across diverse tasks, while the latter suffers from data inefficiency. Moreover, both methodologies encounter challenges in capturing invisible physical attributes, such as forces. In this paper, we propose to interpret video demonstrations through Parameterized Symbolic Abstraction Graphs (PSAG), where nodes represent objects and edges denote relationships between objects. We further ground geometric constraints through simulation to estimate non-geometric, visually imperceptible attributes. The augmented PSAG is then applied in real robot experiments. Our approach has been validated across a range of tasks, such as Cutting Avocado, Cutting Vegetable, Pouring Liquid, Rolling Dough, and Slicing Pizza. We demonstrate successful generalization to novel objects with distinct visual and physical properties.
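A graph with objects as nodes and relationships as edges, each able to carry estimated physical parameters, could be held in a small data structure like the one below. The class names, relation labels, and parameter names (such as `force` and `stiffness`) are hypothetical, and the simulation-based estimation of those parameters is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectNode:
    name: str
    # Estimated physical attributes, e.g. {"stiffness": 0.3}.
    params: Dict[str, float] = field(default_factory=dict)


@dataclass
class RelationEdge:
    src: str
    dst: str
    relation: str  # e.g. "in_contact", "above"
    # Parameters of the relationship, e.g. {"force": 5.0}.
    params: Dict[str, float] = field(default_factory=dict)


@dataclass
class PSAG:
    nodes: Dict[str, ObjectNode] = field(default_factory=dict)
    edges: List[RelationEdge] = field(default_factory=list)

    def add_object(self, name: str, **params: float) -> None:
        self.nodes[name] = ObjectNode(name, dict(params))

    def relate(self, src: str, dst: str, relation: str, **params: float) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as objects first")
        self.edges.append(RelationEdge(src, dst, relation, dict(params)))

    def relations_of(self, name: str) -> List[Tuple[str, str]]:
        """Outgoing (relation, target) pairs for one object."""
        return [(e.relation, e.dst) for e in self.edges if e.src == name]
```

Storing parameters on both nodes and edges is one way to keep the non-geometric attributes the abstract mentions, such as forces, attached to the objects and relationships they describe.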

demonstration, learning, video, (13 more...)

arXiv.org Artificial Intelligence

2408.12674

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
Europe > Greece (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

VIEW: Visual Imitation Learning with Waypoints

Jonnavittula, Ananth, Parekh, Sagar, Losey, Dylan P.

arXiv.org Artificial Intelligence, Apr-27-2024

Robots can use Visual Imitation Learning (VIL) to learn everyday tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems, we introduce Visual Imitation lEarning with Waypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator's intent, employing an agent-agnostic reward function for feedback on the robot's actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further speed up learning. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. VIEW enables robots to learn a diverse range of manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30 minutes, with fewer than 20 real-world rollouts. Code and videos here: https://collab.me.vt.edu/view/
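VIEW condenses a demonstrated trajectory into waypoints and then explores around them. The abstract does not say how the waypoints are extracted, so the sketch below swaps in the classic Ramer-Douglas-Peucker line-simplification algorithm as an illustrative stand-in, followed by Gaussian sampling around a waypoint. Both choices, the tolerance `eps`, and the noise scale `sigma` are assumptions, not VIEW's published method.

```python
import numpy as np


def rdp(points, eps: float) -> np.ndarray:
    """Ramer-Douglas-Peucker simplification of an (N, D) trajectory.

    Keeps the endpoints and recursively keeps the point farthest from the
    start-end line whenever that distance exceeds eps.
    """
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return pts
    start, end = pts[0], pts[-1]
    seg = end - start
    seg_len_sq = float(seg @ seg)
    if seg_len_sq == 0.0:
        dists = np.linalg.norm(pts - start, axis=1)
    else:
        # Perpendicular distance from each point to the line through start and end.
        proj = np.outer((pts - start) @ seg / seg_len_sq, seg)
        dists = np.linalg.norm(pts - start - proj, axis=1)
    i = int(np.argmax(dists))
    if dists[i] > eps:
        left = rdp(pts[: i + 1], eps)
        right = rdp(pts[i:], eps)
        return np.vstack([left[:-1], right])  # drop the duplicated split point
    return np.vstack([start, end])


def sample_near(waypoint: np.ndarray, sigma: float, n: int,
                rng: np.random.Generator) -> np.ndarray:
    """Draw n exploration candidates from a Gaussian centred on a waypoint."""
    return waypoint + rng.normal(0.0, sigma, size=(n, len(waypoint)))
```

For an L-shaped path such as `[[0, 0], [1, 0], [2, 0], [2, 1], [2, 2]]` with `eps=0.1`, `rdp` keeps only the two endpoints and the corner, which is the kind of compact prior a robot could then refine by sampling near each waypoint.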

artificial intelligence, machine learning, robot, (15 more...)

arXiv.org Artificial Intelligence

2404.17906

Country: North America > United States > Virginia (0.04)

Genre:

Research Report > New Finding (0.93)
Instructional Material > Course Syllabus & Notes (0.67)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
